Eliminating small cells from census counts tables: empirical vs. design transition probabilities
Abstract
The software SAFE was developed at the State Statistical Institute Berlin-Brandenburg and has been in regular use there for several years. It implements an algorithm that yields a controlled perturbation of cell frequencies. Once a microdata set has been protected by this method, no table computed from it will contain small cells, e.g. cells with frequency counts of 1 or 2. We compare the empirically observed transition probabilities resulting from this pre-tabular method with the transition matrices of variants of the microdata-key-based post-tabular random perturbation methods suggested in the literature, e.g. Shlomo, N., Young, C. (2008) and Fraser, B., Wooton, J. (2006).
MSC: 62Q05 "Statistical tables"
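To make the notion of an empirically observed transition probability concrete, the following is a minimal sketch, not the SAFE algorithm itself. It assumes we have paired cell counts before and after perturbation and estimates P(perturbed = j | original = i) as a relative frequency. The function name `empirical_transition_probabilities` and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def empirical_transition_probabilities(original, perturbed):
    """Estimate P(perturbed = j | original = i) from paired cell counts.

    `original` and `perturbed` are equal-length sequences holding the
    frequency of the same table cell before and after perturbation.
    Returns a nested dict: result[i][j] is the share of cells with
    original count i that ended up with perturbed count j.
    """
    if len(original) != len(perturbed):
        raise ValueError("need one perturbed count per original count")
    pair_counts = Counter(zip(original, perturbed))  # how often i -> j occurs
    row_totals = Counter(original)                   # how often i occurs at all
    matrix = defaultdict(dict)
    for (i, j), n in pair_counts.items():
        matrix[i][j] = n / row_totals[i]
    return dict(matrix)


# Hypothetical toy data: the perturbed counts contain no 1s or 2s,
# mimicking the "no small cells" property described in the abstract.
original = [1, 1, 2, 2, 2, 3, 5, 5]
perturbed = [0, 3, 0, 3, 3, 3, 5, 4]
probs = empirical_transition_probabilities(original, perturbed)
```

Each row of the resulting matrix sums to 1, so it can be set side by side with a design transition matrix of a post-tabular perturbation method, row by row.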
Similar resources
Statistical Disclosure Control Methods for Census Frequency Tables
This paper provides a review of common statistical disclosure control (SDC) methods implemented at Statistical Agencies for standard tabular outputs containing whole population counts from a Census (either enumerated or based on a register). These methods include record swapping on the microdata prior to its tabulation and rounding of entries in the tables after they are produced. The approach ...
Confidentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical Agencies
Even in the age of electronic dissemination of statistical data, tables are central data products of statistical agencies. For prominent examples, see the American FactFinder (http://factfinder.census.gov/servlet/BasicFactsServlet) from the U.S. Bureau of Census, the Office of National Statistics (http://www.statistics.gov.uk/) in the U.K., and Statistics Netherlands (http://www.cbs.nl/en/figur...
Area specific confidence intervals for a small area mean under the Fay-Herriot model
Small area estimates have received much attention from both private and public sectors due to the growing demand for effective planning of health services, apportioning of government funds and policy and decision making. Surveys are generally designed to give representative estimates at national or district level, but estimates of variables of interest are oft...
Effects of visualizing statistical information – an empirical study on tree diagrams and 2 × 2 tables
In their research articles, scholars often use 2 × 2 tables or tree diagrams including natural frequencies in order to illustrate Bayesian reasoning situations to their peers. Interestingly, the effect of these visualizations on participants' performance has not been tested empirically so far (apart from explicit training studies). In the present article, we report on an empirical study (3 × 2 ...
Bayesian Disclosure Risk Assessment: Predicting Small Frequencies in Contingency Tables
We propose an approach for assessing the risk of individual identification in the release of categorical data. This requires the accurate calculation of predictive probabilities for those cells in a contingency table which have small sample frequencies, making the problem somewhat different from usual contingency table estimation, where interest is generally focussed on regions of high probabil...